Operational Mental Models

💡 This post is a collection of personal thought processes and mental models I use to guide capability development for offensive R&D. Might be helpful as food for thought.

⏸ Fair warning that this is a bit of a dense read. I encourage you to read through it all, but it doesn't have to be done in the same sitting. Take your time and don't feel rushed.


  1. Introduction
  2. Dynamic Playbooks
  3. The Operation Analytics Cube
  4. Symmetrical Task Framing
  5. Discussion and Summary


Two months ago, I published my mental decision tree for approaching EDR sensor evasion (click for full version). While designing it, it dawned on me that the combination of the decision points and their prioritization felt like they represented a signature or fingerprint of my own tradecraft.

Subsequently, I wondered the following:

  1. In analyzing my own decision tree, what can I induce from my approach to tradecraft?
  2. Is it possible to infer decision trees of other threat actors?

We can think of a decision tree as just another way to represent the collection of if/else statements within a function. I was curious to learn if a function could be written to "generate" this decision tree and variations of it. Perhaps the parameters to this function could shed light on the constraints and tendencies of the threat actor using the decision tree. If those are correctly inferred, perhaps they can predict an approximation of future behaviour in the short term.
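To make the idea concrete, here is a toy sketch in Python of a decision tree expressed as a parameterized function. All names, thresholds, and technique labels are hypothetical illustrations, not actual tradecraft:

```python
# Toy sketch (all names and thresholds hypothetical): a decision tree
# rewritten as a function whose parameters approximate an actor's
# constraints and tendencies. Varying the parameters "generates"
# variations of the tree.
def choose_injection_technique(edr_present: bool, risk_tolerance: float,
                               prefers_novelty: bool) -> str:
    """Return a (hypothetical) technique label based on actor parameters."""
    if not edr_present:
        return "classic CreateRemoteThread"   # cheapest option, nothing watching
    if risk_tolerance < 0.3:
        return "abstain / pivot elsewhere"    # cautious actors avoid the host
    if prefers_novelty:
        return "novel in-house loader"        # innovation-heavy fingerprint
    return "public but obscure technique"     # middle-of-the-road tradecraft

print(choose_injection_technique(edr_present=True, risk_tolerance=0.8,
                                 prefers_novelty=False))
```

If the parameters could be inferred from observed choices, the same function would approximate the actor's future behaviour for unobserved situations.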

Dynamic Playbooks


Building on some previous work, I wanted to use parameters or "key drivers" based on the strategic principles from Matt Monte's CNE framework. The collection of these variables—when assigned—form a data type that I would call a "tradecraft policy".

Having some flexibility here can make it possible to have a configurable strategy or "dynamic playbook" which operators can use to evaluate available options in a given situation. In a way, this can help narrow the gap between operational strategy and executed TTPs.

Matt's framework consists of six principles:

  1. Knowledge: broad and deep understanding of technology, people, and organizations.
  2. Innovation: creating or adapting technology and methodology to new circumstances.
  3. Awareness: mapping out the operational domain; detecting and monitoring events.
  4. Precaution: minimizing the impact of unwitting actions by the target on an operation.
  5. Operational Security: minimizing exposure, detection, and reaction to an operation.
  6. Program Security: containing damage caused by the compromise of an operation.

The way I started to approach this was to map each principle to a five-point scale, where a range of responses represents the degree to which the principle is or isn't demonstrated. It's important to choose the response wording carefully. Conventional wisdom in this space offers some advice:

  • Aim for wording that the team will interpret in the same way (no ambiguity).
  • Make response options exhaustive and mutually exclusive.
  • Strive for wording that is specific and concrete (as opposed to general and abstract).

Second, once the responses are defined, each option in the decision tree is ranked objectively against each variable. These rankings should be periodically revised to reflect the current landscape. Lastly, the tradecraft policy is defined, consisting of a combination of hard and soft constraints.
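As an illustration of what such a "tradecraft policy" data type might look like, here is a minimal Python sketch. All field names and example values are my own assumptions, not a prescribed schema; hard constraints disqualify options outright, while soft constraints are weighted preferences used for ranking:

```python
from dataclasses import dataclass, field

# Sketch of a "tradecraft policy" data type: six CNE-principle preferences
# on a five-point scale (-2..+2), weights for soft constraints, and hard
# floors that disqualify options outright. Field names are illustrative.
@dataclass
class TradecraftPolicy:
    preferences: dict = field(default_factory=dict)  # -2..+2 per principle
    weights: dict = field(default_factory=dict)      # soft-constraint multipliers
    hard_floors: dict = field(default_factory=dict)  # minimum acceptable rating

    def permits(self, option_ratings: dict) -> bool:
        """Hard constraints: reject any option rated below a required floor."""
        return all(option_ratings.get(p, 0) >= floor
                   for p, floor in self.hard_floors.items())

policy = TradecraftPolicy(
    preferences={"OpSec": 2, "Knowledge": -1},
    weights={"OpSec": 3, "Knowledge": 1},
    hard_floors={"OpSec": 1},   # never pick an option rated below +1 OpSec
)
print(policy.permits({"OpSec": 2, "Knowledge": -1}))   # True
print(policy.permits({"OpSec": -1, "Knowledge": 2}))   # False
```

The split between `hard_floors` and weighted `preferences` mirrors the hard/soft constraint distinction: the former filters, the latter ranks.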

Below are the five-point scale responses for these variables.
  1. Knowledge: broad and deep understanding of technology, people, and organizations.
    • -2: Option uses existing public knowledge that is widely known.
    • -1: Option uses existing public knowledge that is recent or obscure.
    •  0: Agnostic.
    • +1: Option extends public knowledge with minor novel knowledge.
    • +2: Option currently lacks detailed and public knowledge of technique.

  2. Innovation: creating or adapting technology and methodology to new circumstances.
    • -2: Option uses technology with questionable efficacy.
    • -1: Option uses common technology which should work.
    •  0: Agnostic.
    • +1: Option provides incremental leverage compared to similar capabilities.
    • +2: Option provides novel leverage in a creative way that was not possible before.

  3. Awareness: mapping out the operational domain; detecting and monitoring events.
    • -2: Option deliberately avoids awareness due to operational security concerns.
    • -1: Option leverages passive or min. awareness of situationally-relevant systems.
    •  0: Agnostic.
    • +1: Option promotes proactive awareness of situationally-relevant systems.
    • +2: Option promotes holistic awareness of the environment + recent telemetry.

  4. Precaution: minimizing the impact of unwitting actions by the target on an operation.
    • -2: Option will place existing persistence methods at risk.
    • -1: Option demotes the use of redundant persistence methods.
    •  0: Agnostic.
    • +1: Option promotes the use of redundant persistence methods.
    • +2: Option promotes the use of methods that are both redundant and diverse.

  5. Operational Security: minimizing exposure, detection, and reaction to an operation.
    • -2: Option is detected and prevented immediately using on-sensor logic.
    • -1: Option is detected immediately or emits events that will raise online detection.
    •  0: Agnostic.
    • +1: Option is undetected, but emits suspicious events requiring manual analysis.
    • +2: Option is undetected, with no suspicious events emitted with target's config.

  6. Program Security: containing damage caused by the compromise of an operation.
    • -2: Option, if exposed, will risk burning a capability where no contingency exists.
    • -1: Option, if exposed, will risk burning a capability where there is a contingency.
    •  0: Agnostic.
    • +1: Option, if exposed, will demonstrate a recent or obscure public capability.
    • +2: Option, if exposed, will demonstrate a well-known public capability.

Intuition-building Examples

Before diving into the interactive demo, we can build an intuition for its controls by going over some common meta-examples that tend to happen in operations:

  • Path of least resistance allows operators to make better use of time (when that's a factor), while limiting the exposure of advanced "secret sauce" capabilities to defenders. Put another way, it's a preference for expressing lower levels of knowledge and innovation in order to prioritize common courses of action (CoAs) whenever they are available.

  • Noise amplification is a practice in some red team exercises where, after demonstrating a covert path for a scenario, the team will select progressively noisier TTPs to measure the blue team's detection threshold. Put another way, the tradecraft policy for the exercise starts with preferring a higher level of OpSec, and during noise amplification, it is gradually shifted to lower levels. Adjusting the policy helps clarify DOs and DON'Ts for CoAs.

  • Burned capabilities are a reality many operators deal with (especially around initial access) as detection capabilities grow stronger while traditional attack surface reduces. This can typically lead to adjustments in tradecraft policy where constraints are relaxed to tolerate a higher level of innovation and/or a lower level of program security.

  • Conflicting constraints are an inextricable element of offensive operations, where the constraints of accomplishing a scenario without getting caught can often be at odds with each other. Sometimes, it may make sense to relax them one way or another when at an inflection point. Other times, it may be more appropriate to specify which constraints take precedence by changing their weights; the relative differences between weights also offer some granularity.

Interactive Demo

Below is a very rudimentary toy example of implementing this dynamic playbook model. Although it is a basic representation of what the model could become, I hope some of you can immediately see the value in it. It is best viewed on desktop.

The "Policy Selector" interface below might be a little different than what you're used to. Keep in mind that there are different ways to design this type of interface and the ranking algorithm, allowing for simpler or more complex expressions of tradecraft policy.

I would suggest spending a few minutes playing around with the variables and observing the output options to get the hang of it and build that intuition for yourself. Before you get started, a few notes on the options:

  1. Engagement types in this model simply refer to different tradecraft policies. As mentioned in my previous talk, I have doubts about whether there will be industry-wide consensus on what different engagement types mean. My take is that labels are not so important, as long as we can clearly and consistently describe the effort in terms of the constraints that have been imposed (i.e. tradecraft policy). Matt Monte's CNE principles provide one framework we can use to express those constraints.

  2. Variables represent degrees of expression. It might be counterintuitive to think about how you can increase or decrease a principle like "Knowledge". You can't suddenly gain or lose knowledge, but you can control the degree to which you demonstrate it. Your TTPs can demonstrate knowledge that is widely known (e.g. LPE via modifiable service binary) or knowledge that is esoteric (e.g. LPE via CPU-specific bugs). This can affect the outcome of an operation while also influencing the perception a defender could have of you through the TTPs you express or don't express.

  3. The multiplier dropdowns can be used to adjust how constraints are prioritized. Sometimes constraints can be at odds with each other. For example, the desire to use innovative capabilities while maintaining a high degree of program security can be in conflict when considering the risk of those capabilities being burned without contingency. Multipliers allow us to express which constraints take precedence.

  4. If a preference value is set to 0 (agnostic), then it will be exempt from score calculations.
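Putting the notes above together, one plausible way to implement the ranking (my assumption; the demo's actual algorithm may differ) is a weighted agreement score between the policy's preferences and each option's ratings, with agnostic preferences exempted:

```python
# Minimal sketch of the ranking described above (algorithm assumed, not
# taken from the demo's source): each option's score rewards agreement
# between the policy's preference and the option's rating, scaled by the
# multiplier; preferences set to 0 ("agnostic") are exempt.
def score_option(option_ratings: dict, preferences: dict,
                 multipliers: dict) -> int:
    score = 0
    for principle, pref in preferences.items():
        if pref == 0:
            continue  # agnostic: exempt from score calculations
        rating = option_ratings.get(principle, 0)
        score += multipliers.get(principle, 1) * pref * rating
    return score

prefs = {"Knowledge": -1, "OpSec": 2, "ProgSec": 1}  # path-of-least-resistance lean
mults = {"OpSec": 2}                                  # OpSec takes precedence
noisy_public = {"Knowledge": -2, "OpSec": -1, "ProgSec": 2}
quiet_novel  = {"Knowledge": 2,  "OpSec": 2,  "ProgSec": -2}
print(score_option(noisy_public, prefs, mults))  # (-1)(-2) + 2*2*(-1) + 1*2 = 0
print(score_option(quiet_novel, prefs, mults))   # (-1)(2) + 2*2*2 + 1*(-2) = 4
```

Options can then be sorted by score into DOs (highest) and DON'Ts (lowest), and conflicting constraints are arbitrated by the multipliers.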

🛑 Slow down! This demo may highlight some potentially novel concepts, and if you have scrolled here without reading the prior sections, it's highly recommended to read them to avoid confusion and to build an intuition for using this demo.

[Interactive demo: an Engagement Selector; a Policy Selector with a per-variable preference scale (-2 to +2) and multiplier for each of Knowledge, Innovation, Awareness, Precaution, OpSec, and ProgSec, alongside a scoring matrix; a Tactic Selector; and scored lists of Technique DOs and DON'Ts.]

Applications and Limitations

Attackers can use this tool to develop a tradecraft policy and prescribe a course of action (CoA) using: the policy, known options, and observations about the target's security posture. I find that prescriptive CoAs help at three levels:

  1. People: Help operators cross-check their intuitions as a way of enhancing (not replacing) the deliberation process with other operators.

  2. Process: Help offensive teams gain consensus on the techniques they should prioritize for a given engagement type, in a more expressive way than relying on intuitions and anecdotes.

  3. Technology: Move a step closer to baking policy and oversight into code through technical implementations of safeguards.

Developers can use dynamic playbooks to measure the relative supply of capabilities. By framing supply in the context of tradecraft policy, this can be used to identify gaps and help justify prioritization of efforts. For example, using the "Red Team Operation" policy above, capabilities that align with more dark green shaded regions of the scoring matrix would be measured to have the most utility under that policy, and therefore could justify higher development priority. Of course, other organizational and political factors would also influence priority, so this isn't a perfect model.

Defenders can invert the offensive application to infer tradecraft policies that can be used to profile and predict CoAs. Inference is possible through knowledge of: CoAs observed over a period of time, known options, and security posture observations. Having such a capability implies an intent to understand what alternative options could have been considered for a given CoA, and what information was available to the attacker to estimate CoA efficacy.
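As a rough illustration of this inversion (a crude estimator of my own, not an established method), one could average how much a chosen option's ratings exceed those of the alternatives that were passed over:

```python
# Hedged sketch of inferring a tradecraft policy from observed choices:
# given the options that were available (with per-principle ratings) and
# the CoAs the attacker actually chose, estimate which way their
# preferences lean by averaging chosen-minus-alternative rating gaps.
def infer_preferences(observations):
    """observations: list of (chosen_ratings, [alternative_ratings, ...])."""
    totals, counts = {}, {}
    for chosen, alternatives in observations:
        for principle, rating in chosen.items():
            alt_mean = sum(a.get(principle, 0) for a in alternatives) / len(alternatives)
            totals[principle] = totals.get(principle, 0.0) + (rating - alt_mean)
            counts[principle] = counts.get(principle, 0) + 1
    return {p: totals[p] / counts[p] for p in totals}

# Two observed choices where quiet, well-known options beat noisy, novel ones.
obs = [
    ({"OpSec": 2, "Knowledge": -2}, [{"OpSec": -1, "Knowledge": 2}]),
    ({"OpSec": 1, "Knowledge": -1}, [{"OpSec": -2, "Knowledge": 1}]),
]
print(infer_preferences(obs))  # positive OpSec lean, negative Knowledge lean
```

A positive estimate suggests the actor consistently selects options that express more of that principle than the alternatives they decline, approximating their policy over time.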

While these potential applications are all fascinating to me, there are limitations in the demo above which make it less than practical for operational usage:

  1. There is no integration of situational awareness data. The demo shows how operators can get a sense of what they generally should or shouldn't do. With awareness data, those options could be further filtered by what they can or cannot do.

  2. Measuring OpSec and ProgSec is more nuanced than assigning simple scores. OpSec scores for options depend on what attackers can infer of their target environment, and ProgSec scores depend on what defenders can infer from attackers' capabilities.

  3. Technique-level options do not account for nuances at the procedure-level.

The Operation Analytics Cube

To account for these limitations in a more structured way, I wanted to take a step back and see how this tool would "fit" in a different model, so I can better identify gaps in my thinking and infer directions for offensive capability development. As always, the relevant aphorism to preface this model with is that "all models are wrong, but some are useful".

If offensive R&D efforts had to be dichotomized, I propose the following streams:
  1. Artillery: The arsenal of procedures which operators execute on targets to facilitate CoAs.
    • Examples: exploits, initial access payloads, post-exploitation tools.

  2. Cognitive: The technology and process to make the best decisions for selecting CoAs.
    • Examples: attack path maps, simulation environments, integrated guidance systems.

In OODA Loop terms, the artillery stream is mostly in the Observe and Act spaces, whereas the cognitive stream is mostly in the Orient and Decide spaces. In reality, these streams shouldn't be viewed as separate black boxes, but should be fused together for a more successful outcome.

With all that said, what is the "Operation Analytics Cube"? It's a way to model efforts in the cognitive stream. It extends TTPs to incorporate the data and analytics required for meaningful decision making. Thinking morphologically about these three dimensions can help develop (or recover) those processes.

It is inspired by the ATT&CK and D3FEND frameworks, OODA loops, and the McCumber cube.

In this model, there are Tactics, Techniques, and Procedures. Each space is described in terms of data, which include: the options available, observations made, and the constraints to shape actor decisions. Those data points flow into various analytic techniques to help answer the following: What has happened? What could happen? And what should happen? (Extending to 5W1H.)

Drawing from OODA loops, the actor (attacker or defender) gathers data and produces analytics from their loop to make better and faster decisions. Recovering the data and analytics relevant to their opponent's loop will further bolster those efforts.

I have found this cube useful because it is a tool that allows me to deduce a list of analytical capabilities from it, as part of a forced association exercise in morphological analysis. For example, the Dynamic Playbook, as represented in the demo above, demonstrates associations between Tactics, Techniques, Options, Constraints, and Prescriptions.

In my view, the further an actor can fuse these attributes together within the cognitive stream and into the artillery stream, the more empowered they will be in making better and faster decisions in selecting the best CoA for a given situation.

Symmetrical Task Framing

The Operation Analytics Cube, OODA loops, and other frameworks are useful in highlighting symmetries in opponent behavior. Yet another lens to frame the cat-and-mouse game between offense and defense is to reframe procedures as a series of generation and discrimination tasks.

The general idea is that rational attackers and defenders ultimately perform tasks that generate or discriminate behaviours, and within both of those tasks are a similar set of elements. (We can imagine all of these tasks residing in a so-called "possibility space", where advantages, frictions, gaps, mistakes, and other "noise" affect which tasks within this space materialize.)

When attackers generate behaviours, it typically starts from some level of situational awareness confirming that the behaviour can indeed be generated (observed artifacts). From there, options are validated against a tradecraft policy, with room for appropriate exceptions. Exceptions provide flexibility for situations that the tradecraft policy would not properly account for (regardless of how it's modeled). Finally, the relevant procedural logic of the behaviour is executed, which can produce artifacts that may be observable.

On the flip side, when defenders discriminate behaviours, the process is broadly similar. Attacker-produced artifacts are observed and then classified with relevant detection logic. When there is a match, a reaction policy that's also subject to exceptions can determine the next CoA for the defender or its agent. The act of discriminating the behaviour and reacting to it could itself produce artifacts.
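The two pipelines can be sketched schematically to highlight the symmetry. This is entirely illustrative (data shapes, rule formats, and names are my own assumptions); both tasks consume observed artifacts, consult a policy with exceptions, and emit artifacts of their own:

```python
# Schematic sketch of the symmetric generation/discrimination pipelines.
# Options and detection rules are plain dicts; policies are callables.
def generate(observed_artifacts, policy, exceptions, options):
    """Attacker task: validate options against policy, execute, emit artifacts."""
    feasible = [o for o in options
                if o["requires"] <= set(observed_artifacts)]   # situational awareness
    allowed = [o for o in feasible
               if policy(o) or o["name"] in exceptions]        # policy + exceptions
    if not allowed:
        return []
    chosen = allowed[0]            # procedural logic executes here
    return chosen["emits"]         # artifacts that may be observable

def discriminate(observed_artifacts, detection_logic, reaction_policy, exceptions):
    """Defender task: classify observed artifacts and react, emitting artifacts."""
    matches = [rule for rule in detection_logic
               if rule["pattern"] <= set(observed_artifacts)   # classification
               and rule["name"] not in exceptions]             # exclusions
    if not matches:
        return []
    return reaction_policy(matches[0])   # reacting also produces artifacts

emitted = generate(
    observed_artifacts={"edr_absent"},
    policy=lambda o: True,
    exceptions=set(),
    options=[{"name": "toy", "requires": {"edr_absent"}, "emits": ["proc_event"]}],
)
print(discriminate(emitted, [{"name": "r1", "pattern": {"proc_event"}}],
                   lambda rule: ["alert"], exceptions=set()))
```

Note the mirrored shape: each side observes artifacts, applies a policy subject to exceptions, and produces artifacts the other side may in turn observe.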

While this model doesn't represent a perfect symmetry and can be prone to edge cases, what I find most useful is the systematized recognition of some symmetry and the questions it can raise to stimulate thinking on capability gaps. I'll walk through some non-exhaustive example questions with perspectives from both offensive and defensive sides:

Generation Questions for Defenders

  • Artifacts Observed: What relevant reconnaissance was performed by the attacker? Can this be observed? How did it influence their decision-making process for selecting their CoA?

  • Tradecraft Policy: How much can we infer through observation about the operating principles that are relevant to the attacker? What other constraints influence TTP selection?

  • Decision Exceptions: What was expected but didn't occur (or vice versa)? What can the mismatch in prediction help us infer about the attacker's intent or shifts in policy?

  • Procedural Logic: How does the procedure work at a technical level?

  • Artifacts Produced: What artifacts does it emit? Can they be captured as telemetry?

Discrimination Questions for Attackers

  • Artifacts Observed: We can determine the artifacts produced by our own actions with a high degree of certainty, but how large is the subset that can be observed by defensive agents?

  • Detection Logic: How is the system classifying the activity? Is it a signature or an ML model? Are there relevant gaps in classification or event internalization?

  • Reaction Policy: What constraints shape sense-think-act functions? Is the system configured to react to the behaviour, and if so, will it be passive or active?

  • Decision Exceptions: Which indicators are noisy false positives? Are there opportunities to abuse them by blending in with the noise? Which exclusions are built-in vs. configurable?

  • Artifacts Produced: What artifacts indicate reaction to classification?

The terms "generation" and "discrimination" were deliberately chosen because they are loosely inspired by Generative Adversarial Networks (GANs). Rob Miles made an awesome video on this which helped me build some intuition for the concept and a nuanced way of seeing how attackers and defenders compete.

So that covers the three mental models I wanted to share that stimulate my thinking for capability R&D. At this point, I'll switch to some brief discussion and a summary at the end.


Offensive R&D: In the enterprise red team space, I tend to notice more effort put into artillery capabilities than into cognitive technologies. I don't think one stream is inherently more important than the other. In my view, they are equal, and R&D investments should match accordingly.

Adversary Emulation: I am increasingly convinced that hyperfocusing on specific TTPs that experienced attackers were caught using is the wrong approach to these exercises, and doesn't stay true to "emulation" as I understand it. It is too myopic, reductionist, and can promote a false sense of control to some executives on the defensive side.

Threat actor emulation is a complex systems problem and a serious effort requires a level of situational awareness and capability development that is more diverse and holistic. Enterprise red team operators should heavily consider inferences made about a threat actor's broader intentions, their target selection calculus, policies and norms at both organizational and tradecraft levels, and culture. All of these flow into decision-making processes that eventually get expressed as TTPs. In my mind, accurate modeling of their decision-making process and analytics cube is proportional to accurate predictions of their TTPs.

I think I'm preaching to the choir here, but if you disagree as a red teamer, ask yourself this: If another group were to "emulate" your own team, would only technical knowledge of the actions that got you caught suffice as an accurate predictor of your thought process and future behaviour?

Summary and Conclusions

The purpose of this post was to shed light on some of the mental models I have been using for capability R&D. With respect to the goal of helping operators make better and faster decisions, these models help me find gaps in existing capabilities and infer directions for future R&D. They are summarized below:

  • Dynamic Playbooks: These can be used to operationalize tradecraft policy into prioritized DOs and DON'Ts for operators. Defenders can also use this in reverse (over time) to recover approximations of tradecraft policies when combined with knowledge of the options an attacker could have considered. Offensive developers can use tradecraft policies to measure relative supply of capabilities and guide effort prioritization.

  • Operation Analytics Cube: This extends TTPs to incorporate the data and analytics required for meaningful decision making. By collecting observations, options, and constraints about TTPs, the data could help answer: What has happened? What could happen? And what should happen? Through forced association exercises, this can be used to identify R&D opportunities in the "cognitive stream".

  • Symmetrical Task Framing: The ability to frame offensive and defensive procedures as a series of generation and discrimination tasks can help us recognize behavioural symmetries at a high-level across various elements. In doing so, this can be used as a tool to stimulate thinking about capability gaps on both sides.

Circling back to the "all models are wrong, but some are useful" aphorism, I think it's especially important for us to always consider an interdisciplinary and multi-paradigm approach to solving problems and determining what our best CoAs should be—whether that's on-target or off-target. It's convenient to take a siloed and reductionist approach especially when more isn't expected of us in our roles, but the world doesn't work this way and is full of complex systems.

Thank you for taking the time to read this, and I hope there is at least something you found useful. If not, please don't feel rushed to understand the models. If others find them more intuitive, that isn't any indication that they should have already clicked for you. What fascinates and continues to motivate me is how much more there is to learn and how little I currently know.

Also a big thank you to everyone who reviewed this!

Related Resources

What started me on this path was imagining how something similar to these models can be used outside of InfoSec contexts. Below are some books I have been slowly reading through in parallel (why?). While they are not directly referenced, they have inspired my line of thinking in this space.